Performance Criteria for Graph Clustering and Markov Cluster Experiments

Authors

  • Stijn van Dongen
Abstract

In [6] a cluster algorithm for graphs was introduced called the Markov cluster algorithm or MCL algorithm. The algorithm is based on simulation of (stochastic) flow in graphs by means of alternation of two operators, expansion and inflation. The results in [8] establish an intrinsic relationship between the corresponding algebraic process (MCL process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the particular strengths and weaknesses of the algorithm. In general the algorithm performs well, except for graphs which are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e. the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL algorithm and the heuristic on which the algorithm is grounded. A generic performance criterion for clusterings of weighted graphs is derived, by a stepwise refinement of a simple and appealing criterion for simple graphs. The most refined criterion uses a particular Schur convex function, several properties of which are established. A metric is defined on the space of partitions, which is useful for comparing different clusterings of the same graph. The metric is compared with the metric known as the equivalence mismatch coefficient. The performance criterion and the metric are used for the quantitative measurement of experiments conducted with the MCL algorithm on randomly generated test graphs with 10000 nodes. Scaling the MCL algorithm requires a regime of pruning the stochastic matrices which need to be computed. The effect of pruning on the quality of the retrieved clusterings is also investigated. 2000 Mathematics Subject Classification: 05A18, 05B20, 15A48, 15A51, 62H30, 68R10, 68T10, 90C35.
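The alternation of expansion and inflation mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the author's implementation: the added self-loops, the inflation exponent of 2.0, the convergence tolerance, the way clusters are read off the limit matrix, and all function names are assumptions made for illustration, and no pruning is applied.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix, so flow spreads along two-step walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise each entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(adjacency)
    # Assumption: add a self-loop to every node before normalizing.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Assumed interpretation: each row that still carries flow marks an
    # attractor, and the columns with mass in that row form its cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical test graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
graph = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    graph[a][b] = graph[b][a] = 1.0

result = mcl(graph)
print(result)
```

Expansion lets flow reach more distant nodes, while inflation strengthens strong flows and weakens weak ones, so flow across the single bridging edge tends to die out. Every step here works on a dense matrix; the pruning regime that the abstract describes as necessary for scaling is deliberately left out of this sketch.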

Similar resources

Cluster-Based Image Segmentation Using Fuzzy Markov Random Field

Image segmentation is an important task in image processing and computer vision which attracts many researchers' attention. There are two kinds of pixel information in an image: statistical and structural information, which refer to the feature value of pixel data and local correlation of pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...

Data Clustering Based on Key Identification

Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and its associated cluster structures. Recent trend on big data analysis poses a more demanding requirement on new clustering algorithm to be both scalable and accura...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have so many applications in real world problems. When we deal with huge volume of data, analyzing data is difficult or sometimes impossible. In big data problems, clustering data is a useful tool for data analysis. Singular value decomposition(SVD) is one of the best algorithms for clustering graph but we do not have any choice to select the number of clusters and the number of members ...

Impact of Similarity Measures on Web-page Clustering

Clustering of web documents enables (semi-)automated categorization, and facilitates certain types of search. Any clustering method has to embed the documents in a suitable similarity space. While several clustering methods and the associated similarity measures have been proposed in the past, there is no systematic comparative study of the impact of similarity metrics on cluster quality, possi...

Performance Testing of RNSC and MCL Algorithms on Random Geometric Graphs

The exploration of quality clusters in complex networks is an important issue in many disciplines, which still remains a challenging task. Many graph clustering algorithms came into the field in the recent past but they were not giving satisfactory performance on the basis of robustness, optimality, etc. So, it is most difficult task to decide which one is giving more beneficial clustering resu...


Publication date: 2000